Model description


Discrete time model equations

The age structured viral dynamics model, with all potential transitions is stated in the equations below, with parameters corresponding to table 1 (currently in main text of paper), transitions between states occurs in discrete time \((t\to t+1)\), in each model discrete time setps \((t)\) are set to 0.25 days.

\(S_{N}(t+1)=S_{N}(t) + b(t)(S_F(t)+I_F(t))-(\omega_m+m_j\frac{N}{\kappa}+\beta(I_J(t)+I_N(t)+I_{F}(t)+I_{M}(t)))S_N(t)+\omega R_N(t)\)


\(S_{J}(t+1)=S_{J}(t)+\omega_m(S_N(t)+Ma)-(\mu+m_j\frac{N}{\kappa}+\beta(I_J(t)+I_N(t)+I_{F}(t)+I_{M}(t)))S_J(t)+\omega R_J(t)\)


\(S_{M}(t+1)=S_{M}+(t)\mu \frac{S_{J}}{2}-(m \frac{N}{\kappa}+\beta(I_J(t)+I_N(t)+I_{F}(t)+I_{M}(t))) S_{M}(t) I_M(t)+\omega R_M(t)\)


\(S_{F}(t+1)=S_{F}(t)+\mu \frac{S_{J}}{2}-(m \frac{N}{\kappa}+\beta(I_J(t)+I_N(t)+I_{F}(t)+I_{M}(t))) S_{F}(t)+\omega R_F(t)\)


\(L_{N}(t+1)= L_{N}(t)-( \omega_m+m_j\frac{N}{\kappa}+\epsilon)L_N+\rho I_N\)


\(L_{J}(t+1)= L_{J}(t) + \omega_mL_N(t) - (\mu +m_j\frac{N}{\kappa}+\epsilon)L_J(t)+\rho L_J(t)\)


\(L_{M}(t+1)= L_{M}(t) - (\mu +m_j\frac{N}{\kappa}+\epsilon)L_M(t)+\rho L_M(t)\)


\(L_{F}(t+1)= L_{F}(t) - (\mu +m_j\frac{N}{\kappa}+\epsilon)L_F(t)+\rho L_F(t)\)


\(I_{N}(t+1)= I_{J}(t) - (\omega_m+m_j\frac{N}{\kappa}+\gamma+\rho)I_J+\beta(I_J(t)+I_N(t)+I_{F}(t)+I_{M}(t)))S_N +\epsilon L_N\)


\(I_{J}(t+1)= I_{J}(t) - (\mu+m_j\frac{N}{\kappa}+\gamma+\rho)I_J+\beta(I_J(t)+I_N(t)+I_{F}(t)+I_{M}(t)))S_J +\epsilon L_J\)


\(I_{F}(t+1)=I_{F}(t)-(m \frac{N}{\kappa}\gamma+\rho) I_{F}(t)+\beta (I_{F}(t)+I_{M}(t))S_{F}(t)+\epsilon L_{F}(t)\)


\(I_{M}(t+1)=I_{M}(t)-(m \frac{N}{\kappa}+\gamma+\rho) I_{M}(t)+\beta (I_{M}(t)+I_{F}(t))S_{M}(t)+\epsilon L_{M}(t)\)


\(R_N(t+1) = R_N(t)-(\omega_m +m_j\frac{N}{\kappa}+\omega)R_N(t)\)


\(R_J(t+1) = R_J(t)+\omega_m R_N(t) -(\mu+m_j\frac{N}{\kappa}+\omega)R_J(t)\)


\(R_M(t+1) = R_M(t)+\mu \frac{R_J(t)}{2} -(m\frac{N}{\kappa}+\omega)R_M(t)+\gamma I_M(t)\)


\(R_F(t+1) = R_F(t)+\mu \frac{R_J(t)}{2} -(m\frac{N}{\kappa}+\omega)R_F(t)+\gamma I_F(t)\)

\(M_a(t+1)=M_a(t) +b(R_F(t)+L_F(t))-(\omega_m+m_j\frac{N}{\kappa})M_a(t)\)

Births occur in seasonal pulses, as previously described (Peel et al. 2014), with timing, amplitude and seasonality derived from three parameters in a modified Gaussian function.

\[\begin{align} b(t) = c\exp^{s \cos^2(\pi t - \phi)} \end{align}\]

\(\beta\) is derived from a rearrangement of equation (2) in the main text, here \(N\) is considered an upper limit to the population, thus \(N\) is equal to the carrying capacity \(\kappa\): \[\begin{align} \beta = R_0\bigg(\frac{(\epsilon+m)(\gamma+m+\rho)-\epsilon\rho}{N(\epsilon+m)}\bigg) \end{align}\]

Stochastic transitions between states

Movement between states at each discrete time-step (set as 0.25 days) for individuals is a stochastic process, using a combination of binomial, and multinomial distributions where multiple transition possibilities occur.

e.g. for bats exiting the variable state \(R\), transitions will occur with the following steps:

Model fitting

The models are considered state-space (hidden Markov) models, with observations \(k\) and hidden state variables \(Z\) at time \(t\) and we seek to identify an unbiased estimate of the likelihood value, (also known as a marginal likelihood) \(pr(k|\theta)\), by essentially “marginalising out” (\(Z\)). Here we do this using a particle markov chain monte carlo (pMCMC) algorithm. Which combines a particle filter as an unbiased estimator of the marginal likelihood value, and an MCMC acceptence/rejection algorithm (Andrieu, Doucet, and Holenstein 2010).

pMCMC generic example

Particle filters act as an unbiased estimator of marginal likelihood by applying a form of importance re-sampling, to generate an approximate sample from, and make inferences about an unobserved Markov process (Smith 2013). They are becoming popular in disciplines where state-space models are common such as ecology and epidemiology (Kantas et al. 2015; Peters, Hosack, and Hayes 2010; Knape and De Valpine 2012; Fasiolo et al. 2016; Sheinson, Niemi, and Meiring 2014).

In brief, the marginal likelihood \(pr(k|\theta)\) for the observed data \(k\), the hidden state \(Z\) and the parameter vector \(\theta\) is considered the joint posterior probability \(pr(k|Z,\theta)*p(Z|\theta)\). Using Monte-Carlo approximation for \(p(k|\theta)\), a particle filter with \(j\) particles which have the possible trajectories/evolution of \(Z_j\), the marginal likelihood can be considered as \(pr(k | \theta) \approx \sum_{J} pr\left(k | Z_{j}, \theta\right) * pr\left(Z_{j} | \theta\right)\).

To run the particle filter algorithm requires the following steps:

  1. Initialise the particles with equal weights. \[\begin{align} \begin{array}{l}{Z_{j} \sim pr\left(Z_{j} | \theta\right)} \\ {w_{j}=\frac{1}{J}}\end{array} \end{align}\]

  2. For each particle j at time t, simulate the initial conditions at the first observed data point. \[\begin{align} Z_{j t} \sim pr\left(Z_{j t} | Z_{j}, \theta\right) \end{align}\]

  3. Calculate a probability weighting for each particle based on the results of the simulation and the observed data. \[\begin{align} w_{j t}=pr\left(k_{t} | Z_{j}, \theta\right) \end{align}\]

  4. The marginal likelihood for this data point can be considered the average of each particle weighting. \[\begin{align} p\left(k_{t} | \theta\right)=\frac{1}{J} \sum_{J} w_{j t} \end{align}\]

  5. Normalise the weightings \(\frac{w_{jt}}{(\sum_Jw_{jt} )}\), resample with replacement each particle based on their weighting and simulate forward to the next observed data point. \[\begin{align} Z_{j t+1} \sim pr\left(Z_{j t+1} | Z_{t}, \theta\right) \end{align}\]

Repeat steps 3 to 5 for all observed data, an estimate for the total marginal likelihood value can be considered the product of the average particle weights at each observed timepoint. \[\begin{align} \mathcal{L} (\theta|k_{1: n} )=\displaystyle\prod_{t=1}^n \left(\frac{1}{J} \displaystyle\sum_{j=1}^J w_{jt}\right) \end{align}\]

This marginal likelihood value is then used in a traditional MCMC algorithm, in this instance Metropolis-Hastings, where rejection or acceptance of a parameter proposal is based on the current and previous marginal likelihood values.

Model fitting with pMCMC to Boonah bat data

To fit the model with pMCMC to our observed data, we developed a likelihood function which incorporated the three unique longitudinal data-types; Individual serology for Hendra virus antibodies, individual PCR of urine for Hendra virus RNA, and under roost PCR of urine for Hendra virus RNA.

Below we define how each data type contributes to the likelihood function in detail:

Individual serological and PCR of urine samples

In the observed data, individual bats are found to be in all four empirical states of serological and PCR positivity:

Logically, a relationship between the PCR and serology states is likely and so we consider a joint conditional probability function rather than treating the data as independent. Transitory states and test detection failure are also considered by fitting the coefficients \(\zeta_s\) and \(\zeta_p\) which are bounded between 0 and 1. Thus, the simulated implementation of each observed empirical state are as follows:

If \(k\) is a vector of the observed counts of bats in each empirical state, \(k=(k^{P^+S^+},k^{P^-S^+},k^{P^+S^-},k^{P^-S^-})\), \(z\) is a vector of the simulated prevalences of each state \(z=(z^{P^+S^+},z^{P^-S^+},z^{P^+S^-},z^{P^-S^-})\) and \(N\) is the total observed bats, we can then propose a joint probability function based on a multinomial distribution, \(pr(k^p,k^s|Z,N)\).

\[\begin{align} pr(k^p,k^s|Z) = \frac{N!}{\prod_{j=1}^{4}k_j!} \displaystyle\prod_{j=1}^{4}z_j^{k_j} \end{align}\]

Pooled under roost samples

If the probability of at least one bat within a pooled urine sample of \(x\) individuals being PCR positive for virus RNA can be considered \(P_t=1-(1-p)^x\), where \(p\) is the probability of each individual bat contributing to a pool being positive (Chiang and Reeves 1962). The result of PCR testing on a pooled sample (\(T_i\)) will be one of two states, with the following probabilities: \[\begin{align} T_i=\left\{\begin{array}{l}positive = 1-(1-p)^x\\ negative = (1-p)^x\end{array}\right. \end{align}\]

Therefore, assuming that any one under-roost sample comes from \(x\) bats with a prevalence of \(p\), considering the model state \(Z(t)\), we aim to identify the likelihood of the contribution of positive under-roost samples at time (\(t\)) given (from the predicted GAM values) the number of samples \(N^u(t)\), of which \(k^u(t)\) were positive, \(pr(k^u|N^u,Z)\). As the predicted values from the GAM are for prevalence, the total number of bats (\(N^u(t)\)) was considered the mean number of samples collected on each sampling occasion.

Due to the nature and challenges of field-sampling, under-roost data is likely to contain a degree of overdispersion, as it is highly exposed to external factors which are difficult to account for in study design. Therefore, the probability is unlikely to follow a standard binomial distribution, where the variance is defined by the mean. To account for this, we used a probability function based around a beta-binomial distribution, which allows for an additional variance parameter to be fitted to account for any overdispersion in the observations. A beta-binomial distribution, is a binomial distribution such \(pr(k^u|N^u,Z)\) would be:

\[\begin{align} pr(k^u|N^u,Z)={N^i \choose k^u}p_t^{k^u}(1-p_t)^{N^u-k^u} \end{align}\]

However, \(p_t\) is not constant and is generated from a beta distribution, which takes two shape parameters \(\alpha\) and \(\beta\), \(beta(\alpha,\beta)\) thus;

\[\begin{align} pr(k^u|N^u,Z) = \frac{{N^u \choose k^u}beta(k^u+O_up, N^u-k^u+O_u(1-p_t))}{beta(O_up, O_u(1-p_t))} \end{align}\]

Here, \(O_u\) accounts for overdispersion in the data, and the prevalence of infectious bats \(p\) in \(p_t\) is derived from the model state \(Z(t)\) as \(\frac{I(t)\zeta_u}{N(t)}\), where \(\zeta_u\) is a fitted coefficient accounting for detection failure.

As there is no precise measure of exact bat numbers contributing to a pooled sample, \(x\) is a fitted parameter with a prior based on expert knowledge and estimates from the field. For simplicity, only one value of \(x\) is derived for all pools on a single sampling occasion, assuming that each pooled sample on average has the same number of bats contributing. We also assume that there is no effect of pooling on the diagnostic tests, and any that does occur should be primarily accounted for by the \(\zeta_u\) coefficient.

full likelihood function

Considering the above, and the sampling time points (\(i\)), we can define the full likelihood function as:

\[\begin{align} \mathcal{L} (\theta|N,N^u,k^{(s,p,u)}) =pr(\theta)* \displaystyle\prod_{i = 1}^{16} pr(k^s_i,k^p_i|N_i,Z_i)*pr(k^u_t|N^u_i,Z_i) \end{align}\]


Table 1: Informative priors are based on normal distributions and are shown with 95% credible intervals in brackets, uninformative priors use a uniform distribution and are shown as a minimum and maximum value
Parameter Prior Description
\(S\)
Number of susceptible bats
\(I\)
Number of infectious bats
\(R\)
Number of recovered bats
\(L\)
Number of latently infected bats
\(N\)
Total population of bats
\(\theta\)
Parameter vector
\(Z\)
System state
\(z\)
Vector of simulated individual bat states
\(\zeta_s\) 0-1 Individual serology coefficient
\(\zeta_p\) 0-1 Individual PCR coefficient
\(\zeta_u\) 0-1 Under roost PCR coefficient
\(k^s\)
Observed individual seropositive bats
\(k^p\)
Observed individual PCR positive bats
\(k^u\)
Predicted no. positive bats contributing to pooled under roost samples
\(k\)
Vector of observed individual bat states
\(N^u\)
Number of pooled under-roost samples (mean)
\(p\)
Probability of bat being positive (infection prevalence)
\(P_t\)
Probability of pooled sample being positive
\(O_u\) >0 Overdispersion parameter for under roost data
\(x\) 0-10 Number of bats contributing to each pooled under roost sample
\(T_i\)
Pooled under roost sample test state
\(\beta\) > 0 Infection rate
\(\gamma\) > 1 day Recovery rate
\(\rho\) > 1 day Latency rate
\(\epsilon\) > 1 day Recurrence rate
\(m\) 0.186 (0.146-0.225) year\(^{-1}\) Adult mortality rate
\(m_j\) 0.500 (0.480-0.520) year\(^{-1}\) Juvenile mortality rate
\(b\)
Birth rate
\(c\)
Birth pulse scalar
\(s\) 130 (111-150) Birth pulse synchronicity
\(\phi\) 7.180 (6.787-7.571) Birth pulse timing
\(\mu\) 1.37 year\(^{-1}\) Maturation rate
\(\omega_m\) 0.800 (0.741-0.859) year\(^{-1}\) Maternal antibody waning rate
\(\omega\) > 1 day Antibody waning rate
\(R_0\) > 1 \(R_{0}\)
\(d\) 0-25 Pooled sample contributing bats
\(c_ u\) 1 Env. force scalar
\(s_ u\)
Env. force synchronicity
\(\phi_ u\)
Env. force timing
\(\kappa\) 4000 (2570-6300) Environmental carrying capacity


Initial conditions

Starting parameters are estimated by running a Metropolis-Hastings MCMC algorithm with a deterministic version of the model for 50,000 iterations with a 10,000 iteration burn-in. The median parameter set from each posterior distribution was then used in the pMCMC.

Initial conditions for the bat population are derived from the carrying capacity parameter (\(\kappa\)). A population equal to the carrying capacity is first assumed and this is then split into demographic compartments as follows:

\[\begin{align} N=\kappa \end{align}\]

\[\begin{align} b=(\omega_m + m_j)\frac{m_j+\mu}{\omega_m} \end{align}\]

\[\begin{align} S_F = 0.5 \biggl(\frac{N}{\frac{1+m}{\mu}+\frac{b}{\omega_m+m_j}}\biggr) \end{align}\]

\[\begin{align} S_M = 0.5 \biggl(\frac{N}{\frac{1+m}{\mu}+\frac{b}{\omega_m+m_j}}\biggr) \end{align}\]

\[\begin{align} S_J = (S_F+S_M) \frac{m}{mu} \end{align}\]

\[\begin{align} S_N = N - S_M - S_F - S_J \end{align}\]

Infectious and exposed individuals are added as 5% of the population and the model is run for 50 years to reach an equilibrium before fitting to the first data point.

Example output


Figure 1: 100 repeat runs of the maternal immunity SILI model without seasonal forcing (model 2 in table 1), parameterised using the median values from the posterior estimates of each parameter. Dark blue points show the observed data, purple points show the model outputs for each run of the model at the corresponding observed time period. The top figure shows the simulated and observed serological prevalence, the middle figure shows the simulated and observed virus RNA prevalence from individual urine samples, and the bottom figure shows the simulated and predicted (using observed data and contributing bats parameter $d$) virus RNA prevalence from the under-roost samples. Confidence intervals on observed data are calculated as binomial confidence intervals, with a beta prior on the binomial distribution; the shape of the beta prior for individual samples is uninformitive and for under-roost data is derived from the corresponding fitted overdispersion parameters for each data-type.

Figure 1: 100 repeat runs of the maternal immunity SILI model without seasonal forcing (model 2 in table 1), parameterised using the median values from the posterior estimates of each parameter. Dark blue points show the observed data, purple points show the model outputs for each run of the model at the corresponding observed time period. The top figure shows the simulated and observed serological prevalence, the middle figure shows the simulated and observed virus RNA prevalence from individual urine samples, and the bottom figure shows the simulated and predicted (using observed data and contributing bats parameter \(d\)) virus RNA prevalence from the under-roost samples. Confidence intervals on observed data are calculated as binomial confidence intervals, with a beta prior on the binomial distribution; the shape of the beta prior for individual samples is uninformitive and for under-roost data is derived from the corresponding fitted overdispersion parameters for each data-type.



Figure 2: 100 repeat runs of the maternal immunity SILI model with seasonal forcing (model 4 in table 1), parameterised using the median values from the posterior estimates of each parameter. Dark blue points show the observed data, purple points show the model outputs for each run of the model at the corresponding observed time period. The top figure shows the simulated and observed serological prevalence, the middle figure shows the simulated and observed virus RNA prevalence from individual urine samples, and the bottom figure shows the simulated and predicted (using observed data and contributing bats parameter $d$) virus RNA prevalence from the under-roost samples. Confidence intervals on observed data are calculated as binomial confidence intervals, with a beta prior on the binomial distribution; the shape of the beta prior for individual samples is uninformitive and for under-roost data is derived from the corresponding fitted overdispersion parameters for each data-type.

Figure 2: 100 repeat runs of the maternal immunity SILI model with seasonal forcing (model 4 in table 1), parameterised using the median values from the posterior estimates of each parameter. Dark blue points show the observed data, purple points show the model outputs for each run of the model at the corresponding observed time period. The top figure shows the simulated and observed serological prevalence, the middle figure shows the simulated and observed virus RNA prevalence from individual urine samples, and the bottom figure shows the simulated and predicted (using observed data and contributing bats parameter \(d\)) virus RNA prevalence from the under-roost samples. Confidence intervals on observed data are calculated as binomial confidence intervals, with a beta prior on the binomial distribution; the shape of the beta prior for individual samples is uninformitive and for under-roost data is derived from the corresponding fitted overdispersion parameters for each data-type.


References

Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov Chain Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (3): 269–342.

Chiang, Chin-long, and William C et al. Reeves. 1962. “Statistical Estimation of Virus Infection Rates in Mosquito Vector Populations.” American Journal of Hygiene 75 (3).

Fasiolo, Matteo, Natalya Pya, Simon N Wood, and others. 2016. “A Comparison of Inferential Methods for Highly Nonlinear State Space Models in Ecology and Epidemiology.” Statistical Science 31 (1): 96–118.

Kantas, Nikolas, Arnaud Doucet, Sumeetpal S Singh, Jan Maciejowski, Nicolas Chopin, and others. 2015. “On Particle Methods for Parameter Estimation in State-Space Models.” Statistical Science 30 (3): 328–51.

Knape, Jonas, and Perry De Valpine. 2012. “Fitting Complex Population Models by Combining Particle Filters with Markov Chain Monte Carlo.” Ecology 93 (2): 256–63.

Peel, Alison J, JRC Pulliam, AD Luis, RK Plowright, TJ O’Shea, DTS Hayman, JLN Wood, CT Webb, and O Restif. 2014. “The Effect of Seasonal Birth Pulses on Pathogen Persistence in Wild Mammal Populations.” Proceedings of the Royal Society B: Biological Sciences 281 (1786): 20132962.

Peters, Gareth W, Geoff R Hosack, and Keith R Hayes. 2010. “Ecological Non-Linear State Space Model Selection via Adaptive Particle Markov Chain Monte Carlo (Adpmcmc).” arXiv Preprint arXiv:1005.2238.

Sheinson, Daniel M, Jarad Niemi, and Wendy Meiring. 2014. “Comparison of the Performance of Particle Filter Algorithms Applied to Tracking of a Disease Epidemic.” Mathematical Biosciences 255: 21–32.

Smith, Adrian. 2013. Sequential Monte Carlo Methods in Practice. Springer Science & Business Media.